In [1]:
import pandas as pd
import seaborn as sns
import plotly.express as px

import matplotlib.pyplot as plt
In [2]:
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"

Matplotlib

For this exercise, we have written the following code to load the stocks dataset built into Plotly Express.

In [3]:
stocks = px.data.stocks()
stocks.head()
Out[3]:
         date      GOOG      AAPL      AMZN        FB      NFLX      MSFT
0  2018-01-01  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
1  2018-01-08  1.018172  1.011943  1.061881  0.959968  1.053526  1.015988
2  2018-01-15  1.032008  1.019771  1.053240  0.970243  1.049860  1.020524
3  2018-01-22  1.066783  0.980057  1.140676  1.016858  1.307681  1.066561
4  2018-01-29  1.008773  0.917143  1.163374  1.018357  1.273537  1.040708
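The date column above holds ISO date strings. As far as I can tell, px.data.stocks() returns them as text rather than datetimes, so matplotlib treats every date as a separate category. Below is a minimal sketch of the conversion with pd.to_datetime. It uses the first rows printed above as a small stand-in frame, and the name sample is illustrative:

```python
import pandas as pd

# Stand-in for px.data.stocks(): the first three rows shown above (two stock columns only)
sample = pd.DataFrame({
    'date': ['2018-01-01', '2018-01-08', '2018-01-15'],
    'GOOG': [1.000000, 1.018172, 1.032008],
    'FB': [1.000000, 0.959968, 0.970243],
})

# Parse the ISO date strings into datetime64 values
sample['date'] = pd.to_datetime(sample['date'])

print(sample.dtypes)
```

After the conversion, matplotlib spaces and labels the x-axis as a time axis instead of one tick per row.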

Question 1:

Select a stock and create a suitable plot for it. Make sure the plot is readable and shows the relevant information, such as dates and values.

In [4]:
# Facebook (FB) stock
# Define x and y values; the date strings are converted to datetime
# so that matplotlib spaces and labels the ticks as real dates
x = pd.to_datetime(stocks['date'])
y = stocks['FB']

fig, ax = plt.subplots(figsize=(24, 10))

ax.plot(
    x, y, color='orange',
    linestyle='dashdot', linewidth=1,
    marker='o', markerfacecolor='g'
)

# Set title and axis labels
ax.set_title('Facebook stock')
ax.set_xlabel('Date')
ax.set_ylabel('Stock value (relative to 2018-01-01)')

ax.grid()
plt.show()
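When the dates are real datetimes, matplotlib's date locators and formatters control tick density, so tick positions do not have to be picked by hand. The sketch below assumes that quarterly ticks labelled with month and year suit a long weekly series. format_month_axis and the interval value are illustrative choices and are not part of the original exercise:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd


def format_month_axis(ax, interval=3):
    """Put a tick every `interval` months on a datetime x-axis, labelled like 'Jan 2018'."""
    ax.xaxis.set_major_locator(mdates.MonthLocator(interval=interval))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))


# Weekly dates and FB values from the first rows shown in Out[3]
dates = pd.to_datetime(['2018-01-01', '2018-01-08', '2018-01-15', '2018-01-22', '2018-01-29'])
values = [1.000000, 0.959968, 0.970243, 1.016858, 1.018357]

fig, ax = plt.subplots()
ax.plot(dates, values, marker='o')
format_month_axis(ax)
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

With the full stocks frame, the same call would be made on the axes from the cell above, before plt.show().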

Question 2:

You've already plotted data from one stock. Plotting several stocks together supports comparison.
To distinguish the lines, customise the line styles, markers and colors, and add a legend to the plot.

In [5]:
# Convert the date strings to datetime so matplotlib places the date ticks itself
x = pd.to_datetime(stocks['date'])

# Create plot
fig, ax = plt.subplots(figsize=(24, 10))

# Plot every stock; each line's label is picked up by ax.legend()
ax.plot(x, stocks['GOOG'], color='#6495ED', marker='o', markersize=4, markerfacecolor='black', label='GOOG stock')
ax.plot(x, stocks['AAPL'], color='pink', marker='o', markersize=4, markerfacecolor='black', label='AAPL stock')
ax.plot(x, stocks['AMZN'], color='orange', marker='o', markersize=4, markerfacecolor='black', label='AMZN stock')
ax.plot(x, stocks['FB'], color='red', marker='o', markersize=4, markerfacecolor='black', label='FB stock')
ax.plot(x, stocks['NFLX'], color='purple', marker='o', markersize=4, markerfacecolor='black', label='NFLX stock')
ax.plot(x, stocks['MSFT'], color='blue', marker='o', markersize=4, markerfacecolor='black', label='MSFT stock')

# Set titles
ax.set_title('All stock values')
ax.set_xlabel('Date')
ax.set_ylabel('Stock value (relative to 2018-01-01)')

# Legend built from the line labels (shows line color and marker, unlike manual patches)
ax.legend()

# Plot grid for readability
ax.grid()
plt.show()
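The question also asks for distinct line styles and markers. A more compact approach is to loop over the tickers and cycle through line styles and markers, leaving the colors to matplotlib's default color cycle. In this sketch, plot_stocks and the style lists are illustrative, and the frame is just the first rows shown in Out[3]:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_stocks(df, tickers, ax):
    """Draw one line per ticker, each with its own line style and marker."""
    styles = ['-', '--', '-.', ':']
    markers = ['o', 's', '^', 'D', 'v', 'x']
    dates = pd.to_datetime(df['date'])
    for i, ticker in enumerate(tickers):
        ax.plot(
            dates, df[ticker],
            linestyle=styles[i % len(styles)],
            marker=markers[i % len(markers)],
            markersize=4,
            label=ticker,  # ax.legend() reads these labels, so no manual patches are needed
        )
    ax.legend(title='Stock')
    ax.grid(True)


# Small stand-in frame: first two rows shown in Out[3]
sample = pd.DataFrame({
    'date': ['2018-01-01', '2018-01-08'],
    'GOOG': [1.000000, 1.018172],
    'AAPL': [1.000000, 1.011943],
    'AMZN': [1.000000, 1.061881],
})

fig, ax = plt.subplots(figsize=(10, 4))
plot_stocks(sample, ['GOOG', 'AAPL', 'AMZN'], ax)
```

In the notebook this would be called as plot_stocks(stocks, ['GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'], ax).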

Seaborn

First, load the tips dataset.

In [6]:
tips = sns.load_dataset('tips')
tips.head()
Out[6]:
  total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4

Question 3:

Let's explore this dataset. Pose a question and create a plot that helps answer it.

Some possible questions:

  • Are there differences between male and female customers when it comes to tipping?
  • Which attribute correlates most strongly with the tip?
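For the second question, one numeric starting point is the correlation of each numeric column with tip. This sketch assumes Pearson correlation is adequate. select_dtypes skips categorical columns such as sex or day, so those need a plot or another method. The five rows printed above stand in for the full dataset here, so the resulting numbers mean little:

```python
import pandas as pd


def correlations_with_tip(df):
    """Pearson correlation of every numeric column with 'tip', strongest (by absolute value) first."""
    numeric = df.select_dtypes('number')
    corr = numeric.corr()['tip'].drop('tip')
    return corr.sort_values(key=abs, ascending=False)


# The five rows shown in Out[6], numeric columns only
sample = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'size': [2, 3, 3, 2, 4],
})
print(correlations_with_tip(sample))
```

On the notebook data this would be correlations_with_tip(tips).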
In [7]:
# Difference between male and female customers
# Scatter plot with one regression line per sex; lmplot already adds the legend for hue
figure = sns.lmplot(x='total_bill', y='tip', data=tips, hue='sex', fit_reg=True)

# The regression lines for male and female customers lie close together, so the plot suggests little difference
# A statistical test is needed before drawing a firm conclusion
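As a first statistical check on the sex question, Welch's t statistic compares the mean tip of the two groups without assuming equal variances. This is a swapped-in technique: it compares average tips, not the slopes of the regression lines. welch_t is an illustrative helper. scipy.stats.ttest_ind(a, b, equal_var=False) should give the same statistic together with a p-value:

```python
import math

import pandas as pd


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two independent samples."""
    a = pd.Series(a, dtype=float)
    b = pd.Series(b, dtype=float)
    # Sample variances with ddof=1 (the pandas default); no equal-variance assumption
    standard_error = math.sqrt(a.var() / len(a) + b.var() / len(b))
    return (a.mean() - b.mean()) / standard_error
```

Applied to the notebook data: welch_t(tips.loc[tips['sex'] == 'Male', 'tip'], tips.loc[tips['sex'] == 'Female', 'tip']). A value close to zero means the difference in means is small compared with its standard error.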
In [8]:
# Add a column with the tip as a fraction of the total bill
tips['relative'] = tips['tip'] / tips['total_bill']

# Question: are relative tips higher on weekends?
sns.catplot(data=tips, x='day', y='relative', kind='swarm')

# Sunday shows more extreme values, but there is no clear difference between the days
Out[8]:
<seaborn.axisgrid.FacetGrid at 0x7f789fe4a610>
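A simple numeric summary can back up the swarm plot. This sketch assumes "weekend" means Saturday and Sunday (the tips data only records Thur, Fri, Sat and Sun). weekend_vs_weekday is illustrative, and it recomputes the relative tip itself so it does not depend on the column added above:

```python
import pandas as pd


def weekend_vs_weekday(df):
    """Mean, median and count of the relative tip for weekend (Sat/Sun) versus weekday visits."""
    relative = df['tip'] / df['total_bill']
    is_weekend = df['day'].isin(['Sat', 'Sun']).rename('weekend')
    return relative.groupby(is_weekend).agg(['mean', 'median', 'count'])
```

weekend_vs_weekday(tips) returns one row for weekday visits (False) and one for weekend visits (True). Comparing the medians is less sensitive to the extreme Sunday values than comparing the means.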
In [9]:
# Question: is there a difference in relative tip between lunch and dinner, and does it vary by day?
sns.catplot(data=tips, x='day', y='relative', hue='time', kind='box')

# Only on Friday can lunch and dinner really be compared; there, relative tips appear higher at lunch
# A proper statistical analysis is needed for a reliable answer
Out[9]:
<seaborn.axisgrid.FacetGrid at 0x7f78887a8fd0>
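Whether lunch and dinner can be compared on a given day depends on how many visits of each kind that day has. A count table makes this explicit. Below is a minimal sketch with pd.crosstab, where meals_per_day is an illustrative name:

```python
import pandas as pd


def meals_per_day(df):
    """Number of visits for each combination of day (rows) and meal time (columns)."""
    return pd.crosstab(df['day'], df['time'])
```

meals_per_day(tips) shows which day and time combinations are empty or nearly empty, and therefore too thin for a box-plot comparison.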
In [10]:
tips.head()
Out[10]:
total_bill tip sex smoker day time size relative
0 16.99 1.01 Female No Sun Dinner 2 0.059447
1 10.34 1.66 Male No Sun Dinner 3 0.160542
2 21.01 3.50 Male No Sun Dinner 3 0.166587
3 23.68 3.31 Male No Sun Dinner 2 0.139780
4 24.59 3.61 Female No Sun Dinner 4 0.146808

Plotly Express

Question 4:

Redo the exercises above (questions 2 and 3) with Plotly Express. Create diagrams you can interact with.

The stocks dataset

Hints:

  • Turn the stocks dataframe into a structure that Plotly Express can pick up easily
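One structure that fits the hint above is long (tidy) format, with one row per date and company so that the company name can drive the color. This sketch uses DataFrame.melt. to_long_format and the column names company and value are assumptions, and the frame is the first rows shown in Out[3]:

```python
import pandas as pd


def to_long_format(df):
    """Reshape the wide stocks frame (one column per company) into long format:
    one row per (date, company) pair."""
    return df.melt(id_vars='date', var_name='company', value_name='value')


sample = pd.DataFrame({
    'date': ['2018-01-01', '2018-01-08'],
    'GOOG': [1.000000, 1.018172],
    'AAPL': [1.000000, 1.011943],
})
long_stocks = to_long_format(sample)
print(long_stocks)
```

With the full data this would feed px.line(to_long_format(stocks), x='date', y='value', color='company'). The cell below instead passes a list of columns to y, which recent Plotly Express versions also accept as wide-form input.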
In [11]:
# Stocks dataset (question 2)

dfstocks = px.data.stocks()
fig = px.line(dfstocks, x='date', y=['GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'])
fig.show()

The tips dataset

In [12]:
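# Tips dataset (question 3)
# Difference between male and female customers
# trendline='ols' fits a least-squares line per color and requires the statsmodels package
fig = px.scatter(tips, x='total_bill', y='tip', color='sex', trendline='ols')
fig.show()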
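trendline='ols' draws an ordinary least-squares line for each color group. To compare the two lines numerically, this sketch uses np.polyfit with degree 1, which fits the same kind of least-squares line. fit_lines_by_group is illustrative. Plotly also offers px.get_trendline_results(fig) for retrieving the fitted statsmodels results:

```python
import numpy as np
import pandas as pd


def fit_lines_by_group(df, x, y, group):
    """Least-squares line y = slope * x + intercept for each group, as {name: (slope, intercept)}."""
    fits = {}
    # observed=True skips unused categories when the group column is categorical
    for name, part in df.groupby(group, observed=True):
        slope, intercept = np.polyfit(part[x], part[y], deg=1)
        fits[name] = (slope, intercept)
    return fits
```

fit_lines_by_group(tips, 'total_bill', 'tip', 'sex') returns a slope and intercept for each sex. Similar slopes would agree with the visual impression from the plot.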
In [13]:
# Question: are relative tips higher on weekends?

figurebox = px.box(tips, x='day', y='relative')
figurebox.show()
In [14]:
# Question: is there a difference in relative tip between lunch and dinner, and does it vary by day?
figurebox2 = px.box(tips, x='day', y='relative', color='time')
figurebox2.show()

Question 5:

Recreate the bar plot below, which shows the population of each continent in 2007.

Hints:

  • Extract the data for 2007 from the dataframe and process it accordingly
  • Use the Plotly Express bar chart (px.bar)
  • Give each continent a different color
  • Sort the continents in the visualisation using the axis layout settings
  • Add text to each bar showing the population
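The data preparation in the first hint can be wrapped in a small function. This sketch keeps only the pop column, so no meaningless sums (of year or lifeExp, for example) are produced, and it sorts the totals in ascending order. population_by_continent is an illustrative name:

```python
import pandas as pd


def population_by_continent(df, year=2007):
    """Total population per continent for one year, sorted from smallest to largest."""
    one_year = df[df['year'] == year]
    totals = one_year.groupby('continent')['pop'].sum()
    # reset_index turns the continent index back into a regular column
    return totals.sort_values().reset_index()
```

population_by_continent(px.data.gapminder()) gives a frame with continent and pop columns that can be passed to px.bar with x='pop' and y='continent'. Pre-sorting the frame is one way to order the bars, but the exercise asks for the axis layout setting.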
In [15]:
# Load data
df = px.data.gapminder()

df2007 = df[df['year'] == 2007]

# Sum population per continent (the other summed columns are not meaningful as totals)
# numeric_only=True keeps only numeric columns, as in the output below, whatever the pandas version
df2007continent = df2007.groupby('continent').sum(numeric_only=True)

df2007continent.head()
Out[15]:
             year   lifeExp         pop      gdpPercap  iso_num
continent
Africa     104364  2849.914   929539692  160629.695446    23859
Americas    50175  1840.203   898871184  275075.790634     9843
Asia        66231  2334.040  3811953827  411609.886714    13354
Europe      60210  2329.458   586098529  751634.449078    12829
Oceania      4014   161.439    24549947   59620.376550      590
In [16]:
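# Create a horizontal bar chart with a different color for each continent
# and the population shown as text on each bar
figurebar = px.bar(
    df2007continent.reset_index(),
    x='pop', y='continent', color='continent',
    orientation='h', text='pop',
)
# Order the bars by total population (smallest at the bottom)
figurebar.update_layout(yaxis={'categoryorder': 'total ascending'})
figurebar.show()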
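Raw counts such as 3811953827 are hard to read as bar text. This sketch formats the totals as whole millions, assuming that precision is enough. population_labels is illustrative, and the input values are the pop totals from Out[15]:

```python
import pandas as pd


def population_labels(pop):
    """Format population counts as whole millions, e.g. 24549947 -> '25M'."""
    return (pop / 1e6).round().astype(int).astype(str) + 'M'


# pop totals per continent from Out[15]
totals = pd.Series(
    [929539692, 898871184, 3811953827, 586098529, 24549947],
    index=['Africa', 'Americas', 'Asia', 'Europe', 'Oceania'],
)
print(population_labels(totals))
```

Once stored as a column (for example df2007continent['pop_label'] = population_labels(df2007continent['pop'])), the labels can be passed to px.bar through its text argument.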